where \(\boldsymbol{w}\) is a normalized spatial weight matrix.
Some properties of Moran’s I when there is no spatial autocorrelation / dependence:
\(E(I) = -1 / (n-1)\)
\(Var(I) = (\text{Something ugly but closed form})- E(I)^2\)
\(\underset{n\to\infty}{\lim} \frac{I - E(I)}{\sqrt{Var(I)}} \sim N(0,1)\) (via the CLT)
NC SIDS & Moran’s I
Lets start by using a normalized spatial weight matrix for \(\boldsymbol{w}\) (basedd on shared county borders).
morans_I =function(y, w) {# w here can be either the adjacency matrix or the normalized weight matrix w =normalize_weights(w) n =length(y) num =sum(w * (y-mean(y)) %*%t(y-mean(y))) denom =sum( (y-mean(y))^2 ) (n/sum(w)) * (num/denom)}A =1*st_touches(nc, sparse=FALSE)morans_I(y = nc$SID74, A)
where \(\boldsymbol{w}\) is a normalized spatial weights matrix.
Some properties of Geary’s C:
\(0 < C < 2\)
If \(C \approx 1\) then no spatial autocorrelation
If \(C > 1\) then negative spatial autocorrelation
If \(C < 1\) then positive spatial autocorrelation
Geary’s C is inversely related to Moran’s I
NC SIDS & Geary’s C
Again using an adjacency matrix for \(\boldsymbol{w}\) (shared county borders).
gearys_C =function(y, w) {# w here can be either the adjacency matrix or the normalized weight matrix w =normalize_weights(w) n =length(y) y_i = y %*%t(rep(1,n)) y_j =t(y_i) ((n-1)/(2*sum(w))) * (sum(w * (y_i-y_j)^2) /sum( (y -mean(y))^2 ))}A =1*st_touches(nc, sparse=FALSE)gearys_C(y = nc$SID74, w = A)
[1] 0.8438767
Spatial Correlogram
Rather than using the touches predicate to determine adjacency, we can alternatively use a distance based approach.
In this case we will define the elements of the adjacency matrix as
\[
\{\boldsymbol A\}_{ij} = \mathbb1_{\{\text{distance between } s_i \text{ and } s_j \text{ is less than } d\}}
\]
We can then construct a correlogram by varying \(d\) and then calculating Moran’s I and Geary’s C for each \(\boldsymbol A\).
Instead we need to think about things in terms of their neighbors / neighborhoods. We define \(N(s_i)\) to be the set of neighbors of location \(s_i\).
If we define the neighborhood based on “touching” then \(N(s_3) = \{s_2, s_4\}\)
If we include 2nd order neighbors, then \(N(s_3) = \{s_1,s_2,s_3,s_4\}\)
Defining the Spatial AR model
Here we will consider a simple average of neighboring observations, just like with the temporal AR model we have two options in terms of defining the autoregressive process,
Using \[ y(s) = \phi \frac{1}{|N(s)|}\sum_{s' \in N(s)} y(s') + N(0,\sigma^2) \] we want to find the distribution of \(\boldsymbol{y} = \Big(y(s_1),\, y(s_2),\,\ldots,\,y(s_n)\Big)^t\).
First we can define a weight matrix \(\boldsymbol{W}\) where \[
\{\boldsymbol{W}\}_{ij} = \begin{cases}
1/|N(s_i)| & \text{if $j \in N(s_i)$} \\
0 & \text{otherwise}
\end{cases}
\] then we can write \(\boldsymbol{y}\) as follows, \[ \boldsymbol{y} = \phi \, \boldsymbol{W} \, \boldsymbol{y} + \boldsymbol{\epsilon} \] where \[ \boldsymbol{\epsilon} \sim N(0,\sigma^2 \, \boldsymbol{I}) \]
This is a bit trickier, in the case of the temporal AR process we actually went from joint distribution \(\to\) conditional distributions (which we were then able to simplify).
Since we don’t have a natural ordering we can’t get away with this (at least not easily).
Going the other way, conditional distributions \(\to\) joint distribution is difficult because it is possible to specify conditional distributions that lead to an improper joint distribution.
Brooks’ Lemma
For sets of observations \(\boldsymbol{x}\) and \(\boldsymbol{y}\) where \(p(x) > 0~~\forall ~ x\in\boldsymbol{x}\) and \(p(y) > 0~~\forall ~ y\in\boldsymbol{y}\) then
Adopting different weight matrices \((\boldsymbol{W})\)
Between SAR and CAR model we move to a generic weight matrix definition (beyond average of nearest neighbors)
In time we varied \(p\) in the \(AR(p)\) model, in space we adjust the weight matrix.
In general having a symmetric W is helpful, but not required
More complex Variance (beyond \(\sigma^2 \, I\))
\(\sigma^2\) can be a vector (differences between areal locations)
i.e. since areal data tends to be aggregated - adjust variance based on sample size
i.e. scale based on the number of neighbors
Some specific generalizations
Generally speaking we want to work with a scaled / normalized version of the weight matrix, \[ W_{ij}/W_{i\boldsymbol{\cdot}} \]
When \(W\) is derived from an adjacency matrix, \(\boldsymbol{A}\), we can express this as \[ \boldsymbol{W} = \boldsymbol{D}^{-1} \boldsymbol{A} \] where \(\boldsymbol{D}^{-1} = \text{diag}(1/|N(s_i)|)\).
We can also allow \(\sigma^2\) to vary between locations, we can define this as \(\boldsymbol{D}_{\sigma^2} = \text{diag}(\sigma^2_i)\) and most often we use \[ \boldsymbol{D}_{\sigma^2}^{-1} = \text{diag}\left(\frac{\sigma^2}{|N(s_i)|}\right) = \sigma^2 \boldsymbol{D}^{-1}. \]
Error in solve.default(Sigma_inv): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
check_sigma_pd(phi=-1)
Error in solve.default(Sigma_inv): Lapack routine dgesv: system is exactly singular: U[3,3] = 0
check_sigma_pd(phi=1.2)
Error in chol.default(solve(Sigma_inv)): the leading minor of order 1 is not positive
check_sigma_pd(phi=-1.2)
Error in chol.default(solve(Sigma_inv)): the leading minor of order 1 is not positive
Conclusions
Generally speaking just like the AR(1) model for time series we require that \(|\phi| < 1\) for the CAR model to be proper.
These results for \(\phi\) also apply in the context where \(\sigma^2_i\) is constant across locations, i.e. \[
\boldsymbol{\Sigma} = \big(\sigma^2 \, (\boldsymbol{I}-\phi \boldsymbol{D}^{-1}\boldsymbol{A})\big)^{-1}
\]
As a side note, the special case where \(\phi=1\) is known as an intrinsic autoregressive (IAR) model and they are popular as an improper prior for spatial random effects. An additional sum constraint is necessary for identifiability \[\sum_{i=1}^n y(s_i) = 0\]